Welcome to today’s Coffee & Coding session. This is a follow-up session after our Data Byte series. Today, we will explore how to work with APIs in R using the NHS Open Data Portal as our example. APIs (Application Programming Interfaces) allow us to access data from various web services, making them a powerful tool for data analysis and application development. Additionally, we will cover how to create your own API in R using plumber.

1 Introduction to APIs

1.1 What is an API?

An API is a communication layer that enables different systems to interact without needing to know the details of each other. This session focuses on web APIs.

1.2 Why use APIs?

  • Access to data
  • Automate data retrieval and processing
  • Integration with web services and third-party applications

1.3 Install required packages

# install.packages("httr2")
# install.packages("jsonlite")
# install.packages("dplyr")
# install.packages("data.table")
# install.packages("devtools")
# install.packages("plumber")
# install.packages("stringr")

library(httr2)
library(jsonlite)
library(dplyr)
library(data.table)
library(devtools)
library(plumber)
library(stringr)
library(plotly)

httr2 is an R package designed for making HTTP requests and handling responses. - Simplified syntax for constructing HTTP requests - Support for common HTTP methods (GET, POST, PUT, DELETE) - Tools for handling authentication, including OAuth. - Functions for managing request headers, query parameters, and body content. - Capabilities for processing and parsing HTTP responses.

jsonlite is an R packages that working with JSON data. - Functions for parsing JSON string into R objects (fromJSON) - Functions for converting R objects into JSON string (toJSON) - Support for complex data structures, including nested lists and data frames - Options for pretty-printing and compacting JSON output - Compatibility with streaming JSON data for larlge datasets

plumber enables the creating of RESTful APIs from R scripts. It allows users to expose R functions as web services, making it easy to integrate R with web applications. - Simple annotations to define API endpointss - Support for HTML methods (GET, POST etc) - Tools for managing API documentation and testing endpoints.

2 Making API requests

2.1 Understanding the Endpoint

An endpoint is a specific URL that allows you to access a particular part of an API. For example, an API might have different endpoints for retrieving data, submitting data, or updating data. The structure of an endpoint typically includes the base URL and additional parameters or paths that specify the exact resource you want to interact with. It connects API clients and servers.

The Open Data Portal (ODP) uses CKAN (Comprehensive Knowledge Archive Network), an open-source data portal designed for the storage and distribution of open data. To learn how to interact with the NBSBSA ODP site, you can refer to the CKAN API guide available here.

You can find useful example code in here

# Define the base URL and endpoint 
# To get a full JSON representation of a resource 

base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
package_show_method <- "package_show?id="
dataset_id <- "english-prescribing-data-epd"


# Combine the base URL and endpoint
url <- paste0(base_endpoint, package_show_method, dataset_id)

print(url) # We can check this url
## [1] "https://opendata.nhsbsa.net/api/3/action/package_show?id=english-prescribing-data-epd"
# Make the GET request using httr2
# Sends the request to the API and stores the response. 
response <- request(url) |> 
  req_perform()

if (resp_status(response) == 200) { # or response$status_code 
  # Parse the content of the response
  content <- resp_body_string(response)
  data <- fromJSON(content) # We can check data
  # Save this as data frame
  resource_table <- data$result$resources
  # tail(resource_table)
} else {
  stop("API request failed with status: ", resp_status(response))
}

content refers to the body of the response from the API.resp_body_json directly parse raw byte data into JSON object. It stores in the data variable.

2.2 Querying

From data$size, we can see that each English Prescribing Dataset (EPD) is about 6GB. We can construct SQL queries within the API to download smaller, more focused datasets by selecting only the columns of interest. By utilising the datastore_search_sql? endpoint and specifying the resource_id, we can run SQL queries to extract specific data.

For example, to extract data from the April 2024 data set, specially focusing on BNF chapter 5.1 (Antibacterial drugs) at the Integrated Care Board (ICB) level, and create a bar chart for comparison, we can use the following SQL query.

sql_action_method = "datastore_search_sql?"

# Pull the latest month antibiotic drug prescription data by ICB.
resource_id = "EPD_202404"

# Build our query which will feed after sql_action_method
tmp_query <- paste0(
    "
  SELECT
      YEAR_MONTH,
      REGIONAL_OFFICE_NAME,
      ICB_NAME,
      BNF_CHAPTER_PLUS_CODE,
      COUNT(ITEMS) as ITEM_COUNT
  FROM `",
    resource_id, "`
  WHERE
    BNF_CHEMICAL_SUBSTANCE LIKE '0501%'
  AND ICB_NAME != 'UNIDENTIFIED'  
  GROUP BY
    YEAR_MONTH,
    REGIONAL_OFFICE_NAME,
    BNF_CHAPTER_PLUS_CODE,
    ICB_NAME
  "
  )

  # Build temporary API call
  tmp_api_call <- paste0(
    base_endpoint,
    sql_action_method,
    "resource_id=",
    resource_id,
    "&",
    "sql=",
    URLencode(tmp_query) # Encode spaces in the url
  )
  
  print(tmp_api_call)
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202404&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20REGIONAL_OFFICE_NAME,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202404%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501%25'%0A%20%20AND%20ICB_NAME%20!=%20'UNIDENTIFIED'%20%20%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20REGIONAL_OFFICE_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20ICB_NAME%0A%20%20"
  # We can check in browser
  

  # Grab the response JSON as a temporary list
  tmp_response <- jsonlite::fromJSON(tmp_api_call)
  

  # Extract records in the response to a temporary dataframe
  antibiotic_icb_df <- data.table::as.data.table(tmp_response$result$result$records)
  
  
  # Plot this 
  # Create the Plotly bar chart
fig <- plot_ly(data = antibiotic_icb_df, 
               x = ~ICB_NAME, 
               y = ~ITEM_COUNT, 
               type = 'bar', 
               text = ~paste(REGIONAL_OFFICE_NAME, "<br>Items:", ITEM_COUNT),
               marker = list(color = 'rgba(50, 171, 96, 0.6)',
                             line = list(color = 'rgba(50, 171, 96, 1.0)',
                                         width = 1.5))) |> 
  layout(title = "ICBs with antibacterial drugs items",
         xaxis = list(title = "ICB Name", tickangle = -45),
         yaxis = list(title = "Item Count"),
         margin = list(b = 150),
         hovermode = "closest")

# Display the chart
fig

URLencode is a function in R that is used to encode characters in a URL so that they can be transmitted over the internet. This function ensures that special characters in URLs are properly encoded to avoid issues with URL parsing and transmission.

For example: - A space character () will be encoded as %20. - An apostrophe (') will be encoded as %27. - The greater than or equal to sign (>=) will be encoded as %3E%3D etc.

URLencode is used to encode the SQL query string before appending it to the URL for the API call. This ensures that any special characters in the SQL query are properly encoded.

2.3 Query - EPD with multiple months

Let’s expand our query to loop through multiple months of data. We will extract monthly dispensed Amoxicillin (0501013B0) items in one random GP practice in North East & Yorkshire.

# extract multiple months for GP practices in North East and Yorkshire region

sql_action_method = "datastore_search_sql?"

# start_month & end_month define
start_month = "EPD_202305"
end_month = "EPD_202404"

# Filter within specific range from resources_table
resource_ids = resource_table |> 
  filter(name >= start_month & name<= end_month) |> 
  pull(name)

# check 
print(resource_ids)
##  [1] "EPD_202305" "EPD_202306" "EPD_202307" "EPD_202308" "EPD_202309"
##  [6] "EPD_202310" "EPD_202311" "EPD_202312" "EPD_202401" "EPD_202402"
## [11] "EPD_202403" "EPD_202404"
lane_end_df <- data.frame()

for (month in resource_ids){
  tmp_query <- paste0(
    "
  SELECT
      YEAR_MONTH,
      ICB_NAME,
      PRACTICE_NAME,
      BNF_CHAPTER_PLUS_CODE,
      CHEMICAL_SUBSTANCE_BNF_DESCR,
      COUNT(ITEMS) as ITEM_COUNT
  FROM `",
    month, "`
  WHERE
    BNF_CHEMICAL_SUBSTANCE LIKE '0501013B%'
  AND PRACTICE_NAME LIKE 'LANE END SURGERY'
  GROUP BY
    YEAR_MONTH,
    ICB_NAME,
    BNF_CHAPTER_PLUS_CODE,
    CHEMICAL_SUBSTANCE_BNF_DESCR,
    PRACTICE_NAME
  "
  )

  # Build temporary API call
  tmp_api_call <- paste0(
    base_endpoint,
    sql_action_method,
    "resource_id=",
    month,
    "&",
    "sql=",
    URLencode(tmp_query) # Encode spaces in the url
  )
  
  print(tmp_api_call)
  
  tmp_response <- jsonlite::fromJSON(tmp_api_call)
  
  
  # Extract records in the response to a temporary dataframe
  tmp_df <- data.table::as.data.table(tmp_response$result$result$records) # we can look at our JSON structure 

  # Bind the temporary data to the main dataframe
  lane_end_df <- dplyr::bind_rows(lane_end_df, tmp_df)

}
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202305&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202305%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202306&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202306%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202307&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202307%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202308&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202308%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202309&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202309%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202310&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202310%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202311&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202311%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202312&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202312%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202401&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202401%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202402&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202402%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202403&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202403%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202404&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202404%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"

2.4 Querying FOI

Let’s revisit API guide in the KCAN page. We will use package_list to see what other data sets are available in our Open Data Portal site.

base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
package_list_method <- "package_list" 

# Combine the base URL and endpoint for listing packages 
url_list <- paste0(base_endpoint, package_list_method)

# Perform a request to fetch the results
response_list <- request(url_list) |> 
  req_perform() 

# Check the status of the response, 200 means success
if (resp_status(response_list) == 200) {
  # Parse the content of the response directly as JSON using httr2
  data_list <- resp_body_json(response_list)
  
  # Extract the list of dataset IDs each FOI is separate ids
  dataset_ids <- data_list$result
  
  # Filter the dataset IDs to include only those starting with "foi"
  foi_ids <- dataset_ids[grepl("^foi", dataset_ids)]
  
  # change foi_ids a vector to add to data frame
  foi_ids <- unlist(foi_ids)
  
  foi_ids_df <- data.frame(ID = foi_ids, stringsAsFactors = FALSE)
  
  View(foi_ids_df)
} else {
  cat("Failed to retrieve data: HTTP status", resp_status(response_list), "\n")
}

2.5 Query FOI - Your turn!

Can you pick any foi from foi_ids data frame and extract what was their request and response?

# base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
# package_show_method <- "package_show?id="
# dataset_id <- "foi-_____"
# 
# # Combine the base URL and endpoint
# url <- paste0(base_endpoint, package_show_method, dataset_id)
# 
# print(url)
# 
# # url should look like this: https://opendata.nhsbsa.net/api/3/action/package_show?id=foi-01040
# 
# # Perform a request to fetch the results using httr2
# response <- request(___) |> 
#   httr2::_____() #httr2 function
# 
# # Check the status of the response
# if (resp_status(response) == ___) { # Success status should return this numeric value
#   # Parse the content of the response directly as JSON
#   data <- httr2::___(___) #httr2 function family of : resp_*
#   
#   # Extract the notes part from JSON
#   notes <- data$result$___
#   
#   # Find the index of the start of the response part (most case start with \n#Response etc)
#   response_start <- regexpr("\n#", notes)
#   
#   if (response_start[1] != -1) { # if response find
#     request_part <- substr(notes, 1, response_start[1] - 1) # Request will be the all the text until Response
#     response_part <- substr(notes, response_start[1], nchar(notes)) # All response
#   } else {
#     request_part <- notes
#     response_part <- ""
#   }
#   
#   
#   # Extract tags
#   tags <- data$result$tags
#   
#   tag_names <- sapply(tags, function(tag) tag$name)
#   concatenated_tags <- paste(tag_names, collapse = ", ")
#   
#   
#   
#   # Clean text by removing all characters except alphanumeric, spaces, and URL patterns
#   clean_text <- function(text) {
#     text <- gsub("(?<!http:|https:)[^[:alnum:] [:space:] /:._-]", "", text, perl = TRUE)
#     return(text)
#   }
#   
#   request_part_clean <- clean_text(request_part)
#   response_part_clean <- clean_text(response_part)
#   
#   # Display the cleaned parts & tags
#   cat("Tags:\n", concatenated_tags, "\n")
#   cat("Request Part:\n", request_part_clean, "\n\n")
#   cat("Response Part:\n", response_part_clean, "\n")
# } else {
#   cat("Failed to retrieve data: HTTP status", resp_status(response), "\n")
# }

Example answer:

base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
package_show_method <- "package_show?id="
dataset_id <- "foi-01960"

# Combine the base URL and endpoint
url <- paste0(base_endpoint, package_show_method, dataset_id)

print(url)
## [1] "https://opendata.nhsbsa.net/api/3/action/package_show?id=foi-01960"
# url should look like this: https://opendata.nhsbsa.net/api/3/action/package_show?id=foi-01040

# Perform a request to fetch the results using httr2
response <- request(url) |> 
  req_perform() #httr2 function

# Check the status of the response
if (resp_status(response) == 200) { # Success status should return this numeric value
  # Parse the content of the response directly as JSON
  data <- resp_body_json(response) #httr2 function family of : resp_*
  
  # Extract the notes part from JSON
  notes <- data$result$notes
  
  # Find the index of the start of the response part (most case start with \n#Response etc)
  response_start <- regexpr("\n#", notes)
  
  if (response_start[1] != -1) { # if response find
    request_part <- substr(notes, 1, response_start[1] - 1) # Request will be the all the text until Response
    response_part <- substr(notes, response_start[1], nchar(notes)) # All response
  } else {
    request_part <- notes
    response_part <- ""
  }
  
  
  # Extract tags
  tags <- data$result$tags
  
  tag_names <- sapply(tags, function(tag) tag$name)
  concatenated_tags <- paste(tag_names, collapse = ", ")
  
  
  
  # Clean text by removing all characters except alphanumeric, spaces, and URL patterns
  clean_text <- function(text) {
    text <- gsub("(?<!http:|https:)[^[:alnum:] [:space:] /:._-]", "", text, perl = TRUE)
    return(text)
  }
  
  request_part_clean <- clean_text(request_part)
  response_part_clean <- clean_text(response_part)
  
  # Display the cleaned parts & tags
  cat("Tags:\n", concatenated_tags, "\n")
  cat("Request Part:\n", request_part_clean, "\n\n")
  cat("Response Part:\n", response_part_clean, "\n")
} else {
  cat("Failed to retrieve data: HTTP status", resp_status(response), "\n")
}
## Tags:
##  HR 
## Request Part:
##  Thank you for your request for information about the following: 
## 
## I would like to know:
## 
## 1    How many staff the  NHS Business Services Authority has.
## 
## 2    How many staff in the  NHS Business Services Authority have died each year for the last 5 years in RTCs.
## 
## 3    Of those I would like to know if they were working at the time commuting to work or off duty.
## It would also be good to know what percentage of the total deaths were in a RTC for the police it was 53.
## 
## 4    I would also like to know if you have a Road safety strategy like the MoD Defence Road Safety Strategy - GOV.UK www.gov.uk and if you have guidance on managing fatigue along the lines of what the ORR are currently consulting on for rial staff Managing rail staff fatigue: draft version for consultation orr.gov.uk.
## 
## Your request was received on 23 May 2024 and I am dealing with it under the terms of the Freedom of Information Act 2000. 
##  
## 
## Response Part:
##  
## Response 
## 
## 1.   4827
## 
## 2.   The NHS Business Authority have had no deaths in service caused by Road Traffic Collisions 
## 
## 3.   N/A
## 
## 4.   The NHS Business Authority does not have a road safety strategy
## 
## Please note that this request and our response is published on our Freedom of Information disclosure log at: 
## 
## https://opendata.nhsbsa.net/dataset/foi-01960 
## 
## Data Queries 
## 
## If you have any queries regarding the data provided or if you plan on publishing the data please contact dataandinsightsupportnhsbsa.nhs.uk / foidatanhsbsa.nhs.uk / foirequestsnhsbsa.nhs.uk ensuring you quote the above reference. This is important to ensure that the figures are not misunderstood or misrepresented. 
## 
## If you plan on producing a press or broadcast story based upon the data please contact communicationsteamnhsbsa.nhs.uk This is important to ensure that the figures are not misunderstood or misrepresented. 
## 
## The information supplied to you continues to be protected by the Copyright Designs and Patents Act 1988 and is subject to NHSBSA copyright. This information is licenced under the terms of the Open Government Licence detailed at: 
## http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/  
## 
## Should you wish to re-use the information you must include the following statement: 
## NHSBSA Copyright 2024 This information is licenced under the terms of the Open Government Licence: 
## 
## http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/  
## 
## Failure to do so is a breach of the terms of the licence. 
## 
## Information you receive which is not subject to NHSBSA Copyright continues to be protected by the copyright of the person or organisation from which the information originated.  Please obtain their permission before reproducing any third party non NHSBSA Copyright information.

2.6 Query FOI - Run all FOIs

Similar to EPD dataset, we can expand the process of extracting all data from FOI datasets. This allows us to tidy the text further and conduct various analyses, such as keyword extraction and other NLP (Natural Language Processing) techniques.

base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
package_list_method <- "package_list" 
url <- paste0(base_endpoint, package_list_method)


response_list <- request(url) |> 
  req_perform() # from here we want to extract only foi-xxxxx etc


if (resp_status(response_list) == 200) {
  
  data_list <- resp_body_json(response_list) # pass it as JSON
  
  # Extract the list of dataset IDs each FOI is separate ids
  dataset_ids <- data_list$result
  
  # Using grepl to extract id starts with foi 
  foi_ids <- dataset_ids[grepl("^foi", dataset_ids)] 
  
  
   # Select the top 30 foi (for demonstration, as it takes a while if we run all)
  top_foi_ids <- head(foi_ids, 30)
  
  
  # Create empty data frame which will hold our informations 
  results_df <- data.frame(ID = character(),
                           Tags = character(),
                           Request = character(),
                           Response = character(),
                           stringsAsFactors = FALSE)
  
  # Loop through each FOI ID to fetch and process the data
  for (dataset_id in top_foi_ids) {
    # Combine the base URL and endpoint for each dataset 
    url <- paste0(base_endpoint, "package_show?id=", dataset_id)
    
    # Perform a request to fetch the results using httr2::req_perform
    response <- request(url) |> 
      httr2::req_perform() 
    
    if (resp_status(response) == 200) { 
      
      # Parse the content of the response directly as JSON
      data <- httr2::resp_body_json(response) # httr2 function family of : resp_*
      
      # Extract the notes part from JSON
      notes <- data$result$notes
      
      # Find the index of the start of the response part (most case start with \n#Response etc)
      response_start <- regexpr("\n#", notes)
      
      if (response_start[1] != -1) { # if response found
        request_part <- substr(notes, 1, response_start[1] - 1) # Request will be all the text until Response
        response_part <- substr(notes, response_start[1], nchar(notes)) # All response
      } else {
        request_part <- notes
        response_part <- ""
      }
      
      # Extract tags
      tags <- data$result$tags
      tag_names <- sapply(tags, function(tag) tag$name)
      concatenated_tags <- paste(tag_names, collapse = ", ")
      
      # Clean text by removing all characters except alphanumeric, spaces, and URL patterns
      clean_text <- function(text) {
        text <- gsub("(?<!http:|https:)[^[:alnum:] [:space:] /:._-]", "", text, perl = TRUE)
        return(text)
      }
      
      request_part_clean <- clean_text(request_part)
      response_part_clean <- clean_text(response_part)
      
      # Append the results to the data frame
      # each result will be row binding
      results_df <- rbind(results_df, data.frame(ID = dataset_id,
                                                 Tags = concatenated_tags,
                                                 Request = request_part_clean,
                                                 Response = response_part_clean,
                                                 stringsAsFactors = FALSE))
    } else {
      cat("Failed to retrieve data for ID:", dataset_id, "- HTTP status", resp_status(response), "\n")
    }
  }
  
  # View the results data frame
  View(results_df)
} else {
  cat("Failed to retrieve data: HTTP status", resp_status(response_list), "\n")
}

2.6.1 Plumber

Plumber is an R package designed to make it easy to create web APIs using R. It allows R users to expose R functions as API endpoints with minimal effort, transforming R scripts into HTTP services. Plumber is particularly useful if you have your analytical models or data processing tasks as web services without needing to learn a new programming language or framework.

Define your API by adding special comments, or “annotations,” to your R functions. These annotations specify how each function should behave when it receives an HTTP request. Plumber supports various HTTP methods (such as GET, POST, PUT, DELETE), making it flexible for different types of web services. - Plumber requires annotation starts with #* - For example #* @get defines GET endpoint like our example code plumber.r

Plumber APIs can handle inputs and outputs in various formats, including JSON, which is commonly used for data exchange on the web. This makes it easy to integrate R with other web-based applications and services.

Once defined, a plumber API can be run locally for testing or deployed to a server for broader access. Deployment can be done on cloud platforms, dedicated servers, or through services like RStudio Connect, which provide additional features such as user authentication, secure access, and easy management of deployed APIs.

Overall, Plumber enables R users to leverage their existing R code and expertise to create powerful, interactive, and shareable web services, facilitating the integration of R’s analytical capabilities into a wide range of applications.

2.6.2 Example

Imagine a regional health authority that wants to fetch antibiotic prescription data for April 2024 (202404) and visualize it in their dashboard to monitor patterns across ICB.

We will use Posit Connect to achieve this. Here’s the process:

  1. Convert Code to Plumber API Script: Take our initial example code and convert it into a Plumber API script (e.g., plumber.R):

    r <- plumber::plumb(file = “./2024-06-25-API/plumber.R”) r$run(port = 8000)

  2. Test Locally: Run the API locally to ensure it works correctly.

  3. Publish to Posit Connect: Once satisfied with the local testing, publish the Plumber API script to Posit Connect. This platform will provide a URL for your API.

  4. Share the API URL: Share the provided URL so others can access your API. For example, the URL might look like this: https://connect.nhsbsa-analytics.com/antibiotic_data?resource_id=202404

Using this URL, users can access the API and create charts to visualize the data as needed.


2.7 Summary

In this session, we explored how to work with APIs in R, focusing on accessing data from the NHS Open Data Portal and creating our own APIs using the R Plumber package. We learned how to make HTTP requests, parse JSON responses, and query specific datasets. Additionally, we covered how to use Plumber to expose R functions as API endpoints, facilitating the integration of R’s analytical capabilities into web services. By leveraging these tools, you can enhance your data workflows and create interactive, shareable web services.